FPGA vs. ASIC: Which camp will prevail in implementing artificial intelligence?

Artificial intelligence is on the rise, and numerous startups and established companies are developing intelligent hardware that uses artificial intelligence applications as a selling point. Powerful cloud-based artificial intelligence services (such as Google's AlphaGo) are beginning to emerge. At the same time, people also hope to bring artificial intelligence to mobile terminals, especially in combination with future Internet of Things applications.

Traditionally, artificial intelligence on a mobile terminal is implemented by sending the terminal's data over the network to the cloud, which performs the computation and sends the result back to the terminal. Apple's Siri service works this way.

However, this approach has several problems. First, transmitting data over the network introduces delay. The result may well take several seconds, or even tens of seconds, to come back to the terminal (anyone who has used the Prisma app to process photos will know this well). Applications that need immediate results therefore cannot work this way. For example, if the deep learning obstacle-avoidance algorithm on a drone ran entirely in the cloud, the drone would crash before the result came back.

Second, once data is transmitted over the network, it risks being hijacked. Therefore, applications that require low computational latency, and applications that are very sensitive to data security, need to run the entire artificial intelligence algorithm on the terminal, or at least perform some preprocessing on the terminal and then send a small number of intermediate results (instead of a large amount of raw data) to the cloud to complete the final calculation. This requires the mobile terminal hardware to complete these operations quickly. On the other hand, the hardware must not consume too much energy doing so, otherwise the battery will soon run out (an Nvidia Pascal graphics card that draws 200 W or more is clearly not an option in a mobile phone).
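The two constraints above (latency and data security) can be written down as a simple decision rule. The following is only a rough sketch: the `Workload` class, the `choose_site` function, and every number in it are hypothetical and chosen for illustration, not taken from any real product.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    latency_budget_ms: float  # longest acceptable wait for a result
    data_sensitive: bool      # True if raw data should not leave the terminal


def choose_site(work, network_round_trip_ms, cloud_compute_ms):
    """Return 'terminal' or 'cloud' for a workload (illustrative rule only)."""
    if work.data_sensitive:
        # Sensitive data stays on the terminal regardless of speed.
        return "terminal"
    cloud_total_ms = network_round_trip_ms + cloud_compute_ms
    # The cloud is acceptable only if the whole round trip fits the budget.
    return "cloud" if cloud_total_ms <= work.latency_budget_ms else "terminal"


# Assumed figures: a 2-second network round trip and 100 ms of cloud compute.
drone = Workload(latency_budget_ms=50, data_sensitive=False)
photo_filter = Workload(latency_budget_ms=30_000, data_sensitive=False)

print(choose_site(drone, 2000, 100))         # terminal
print(choose_site(photo_filter, 2000, 100))  # cloud
```

A real system would also have to weigh the energy cost of computing on the terminal, which this sketch leaves out.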

At present, many companies are actively developing hardware for mobile artificial intelligence. There are two major camps for implementing mobile intelligent hardware: FPGA and ASIC. The FPGA camp is represented by the Zynq platform that Xilinx is promoting heavily, and the ASIC camp by Movidius. Each camp has its own strengths, which are examined in turn below.

FPGA vs. ASIC

Let's first talk about the difference between an FPGA and an ASIC. FPGA stands for "Field Programmable Gate Array". The basic principle is that the FPGA chip integrates a large number of basic digital logic gates and memories, and the user defines the connections between these gates and memories by loading a configuration file into the FPGA. This programming is not one-off: the user can configure the FPGA as a microcontroller (MCU) today, then edit the configuration file tomorrow and configure the same FPGA as an audio codec. An ASIC is an Application-Specific Integrated Circuit. Once the design is completed, the circuit is fixed and cannot be changed.
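To make the idea of "same hardware, different configuration file" concrete, here is a toy model in Python. Real FPGAs typically build their logic out of small lookup tables (LUTs) whose contents come from the configuration file; the `LookupTable` class below is a hypothetical, heavily simplified stand-in for one such element, not a model of any particular chip.

```python
class LookupTable:
    """Toy 2-input lookup table: four configuration bits define its function."""

    def __init__(self):
        self.bits = [0, 0, 0, 0]

    def configure(self, bits):
        """Load new configuration bits (a stand-in for the configuration file)."""
        if len(bits) != 4:
            raise ValueError("a 2-input lookup table needs exactly 4 bits")
        self.bits = list(bits)

    def evaluate(self, a, b):
        """Use the two inputs as an index into the stored bits."""
        return self.bits[(a << 1) | b]


AND_CONFIG = [0, 0, 0, 1]  # output is 1 only when a = 1 and b = 1
XOR_CONFIG = [0, 1, 1, 0]  # output is 1 when a and b differ

lut = LookupTable()
inputs = [(a, b) for a in (0, 1) for b in (0, 1)]

lut.configure(AND_CONFIG)  # "today" the element behaves as an AND gate
and_outputs = [lut.evaluate(a, b) for a, b in inputs]

lut.configure(XOR_CONFIG)  # "tomorrow" the same element becomes an XOR gate
xor_outputs = [lut.evaluate(a, b) for a, b in inputs]

print(and_outputs)  # [0, 0, 0, 1]
print(xor_outputs)  # [0, 1, 1, 0]
```

In an ASIC the equivalent of `bits` would be fixed when the chip is manufactured, so there would be no `configure` step at all.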

FPGA for deep learning accelerator (Xilinx Kintex 7 UltraScale, top) and ASIC (Movidius Myriad 2, bottom)

Comparing FPGAs and ASICs is like comparing LEGO bricks and molded models. For example, suppose you notice that Master Yoda from the recent Star Wars is very popular and you want to sell a Master Yoda toy. What do you do?

There are two ways: build it from LEGO bricks, or have a factory make a custom mold. With LEGO bricks, once you have designed the shape of the toy you only need to buy a set of bricks. With a factory mold, designing the shape is just the start: you still have a lot more to work out, such as whether the toy's material will give off an odor or whether the toy will melt at high temperature. The up-front work for a LEGO toy is therefore much less than for a factory-molded one, and the time from design to market is much shorter with LEGO.

The same is true for FPGAs and ASICs. With an FPGA, once you have written the Verilog code you can implement a hardware accelerator using the tools provided by the FPGA vendor. Designing an ASIC requires a great deal of additional verification and physical design work (ESD, packaging, etc.), which takes more time. For special applications (such as military and industrial uses that demand high reliability), an ASIC needs even more design time to meet the requirements, whereas with FPGAs you can simply buy military-grade, high-stability parts without affecting development time. However, although the design time is shorter, a toy built from LEGO bricks is rougher (lower performance) than a factory-customized toy (see the figure below); after all, the factory mold is tailor-made.

In addition, at large shipment volumes, mass-producing toys in a factory is much cheaper than using LEGO bricks. The same is true for FPGAs and ASICs. At any given point in time, an ASIC accelerator built in the best available process will be 5 to 10 times faster than an accelerator built on an FPGA in the same process, and after mass production the cost of the ASIC will be far lower than that of the FPGA solution (10 to 100 times cheaper).
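The volume argument can be illustrated with a small break-even calculation. This is a sketch under assumed numbers: the upfront design cost and both unit prices below are invented for illustration (the ASIC unit is taken to be 20 times cheaper, inside the 10 to 100 times range quoted above), and `total_cost` and `break_even_units` are hypothetical helper names.

```python
import math


def total_cost(units, upfront_cost, unit_cost):
    """Total spend for a production run: one-time cost plus per-unit cost."""
    return upfront_cost + units * unit_cost


def break_even_units(asic_upfront, asic_unit, fpga_unit, fpga_upfront=0.0):
    """Smallest volume at which the ASIC's total cost is no higher than the FPGA's."""
    if asic_unit >= fpga_unit:
        raise ValueError("the ASIC never catches up unless its unit cost is lower")
    extra_upfront = asic_upfront - fpga_upfront
    return max(0, math.ceil(extra_upfront / (fpga_unit - asic_unit)))


# Assumed, purely illustrative figures (arbitrary currency units).
ASIC_UPFRONT = 2_000_000  # one-time design, verification and mask cost
ASIC_UNIT = 5             # per-chip cost in mass production
FPGA_UNIT = 100           # per-chip cost of an off-the-shelf FPGA

volume = break_even_units(ASIC_UPFRONT, ASIC_UNIT, FPGA_UNIT)
print(volume)  # 21053

# Just below that volume the FPGA is still the cheaper choice.
print(total_cost(volume - 1, 0, FPGA_UNIT) < total_cost(volume - 1, ASIC_UPFRONT, ASIC_UNIT))  # True
```

With these assumed figures the ASIC only pays off beyond roughly twenty thousand units; for a small production run the FPGA remains cheaper despite its higher unit price.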

FPGA vs ASIC: building blocks vs figurines

Of course, another major feature of FPGAs is that they can be reconfigured at any time to implement different functions in different situations. However, when an FPGA-based accelerator is sold to users as a product, letting the user reconfigure it takes a lot of effort.

Returning to the LEGO toy example, the toy maker can claim that because Master Yoda is built from bricks, players can reassemble the bricks into other characters (such as Luke Skywalker). But what if the average player has no idea how to take the bricks apart and rebuild them? The solution is either to target hardcore players who are skilled with building blocks, or to add a switch to the back of the toy so that an ordinary player can have the bricks reassemble themselves with a single click. Obviously, the second option sets a high technical bar.

For FPGA accelerators, if you want to use reconfigurability as a selling point, you must either sell to enterprise users who are able to develop for FPGAs themselves (companies such as Baidu and Microsoft are indeed developing FPGA-based deep learning accelerators and configuring the FPGA as a different accelerator for different applications), or develop an easy-to-use compiler that converts the user's deep learning network into an FPGA configuration file (companies such as Shenjian are attempting this).

As things stand, FPGA compilation takes several minutes even on a high-end server, and longer still if it is done on a mobile terminal with weak computing power. How to convince mobile end users to try reconfiguring the FPGA, and to accept waiting up to tens of minutes for the network to compile and the FPGA to be configured, remains an open problem.
